Caching at Different Layers
Learn how caching works at different layers of web applications.
Introduction#
When designing an API, we tend to optimize it for performance. Let's assume that multiple users want to access a Twitter trend and see the Tweets related to it. In response to each of these network calls, the server retrieves the relevant Tweets from different databases and delivers them to each client. Every subsequent request, including repeated ones from the same client, experiences the same delay. The delay stems from HTTP's stateless nature, network round trips, server-side computation, and so on.
This is illustrated in the slides below:
Therefore, we require a mechanism that efficiently handles the issues mentioned above and reduces the client latency. For this purpose, we can use a cache.
What is a cache?#
A cache is temporary memory that stores frequently reusable API responses. For instance, the cache stores the trends object when it’s retrieved for the first time by the client. If there isn’t any new trend and the same client refreshes or revisits the page, it’s returned from the cache instead of the server. Therefore, the cache significantly reduces the delay and recomputation on the server's side. The flow of the HTTP request-response is given below using the cache:
The client sends the HTTP request to the server, which is served by the intermediate cache.
If the cache does not have the requested object, the request is forwarded to the back-end server.
The server sends the response to the browser and saves the copy of the requested object in the cache for subsequent requests.
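The three steps above amount to the cache-aside pattern. Below is a minimal sketch of that flow; `fetch_from_backend` and the dictionary-based cache are illustrative stand-ins for a real backend and cache store, not part of any specific library.

```python
# Cache-aside sketch of the flow above: check the cache first,
# fall back to the backend on a miss, then store a copy for next time.

cache = {}

def fetch_from_backend(key):
    # Placeholder for the real database/server call.
    return f"tweets-for-{key}"

def handle_request(key):
    if key in cache:                        # cache hit: served by the cache
        return cache[key], "hit"
    response = fetch_from_backend(key)      # cache miss: forward to the back end
    cache[key] = response                   # save a copy for subsequent requests
    return response, "miss"

print(handle_request("trend-42"))  # ('tweets-for-trend-42', 'miss')
print(handle_request("trend-42"))  # ('tweets-for-trend-42', 'hit')
```

The second request for the same key never reaches the backend, which is exactly the latency saving the lesson describes.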
Since the cache is essentially a copy of the data held in the source of truth (the database), an outdated/stale version of the data in the cache is a known issue. To keep the cache consistent with the server, we need to update it at regular intervals. Various eviction strategies, such as LRU (least recently used), LFU (least frequently used), and MRU (most recently used), are employed to evict stale entries and make room for fresh ones.
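The LRU policy mentioned above can be sketched in a few lines using Python's `OrderedDict`. The `LRUCache` class below is an illustrative fixed-capacity cache, not a production implementation:

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used eviction: when the cache is full, drop the
    entry that has gone unused the longest."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)   # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes most recently used
cache.put("c", 3)      # evicts "b", the least recently used
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```

LFU and MRU follow the same shape but change only the choice of which entry to evict.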
Point to Ponder
Question
What should be the optimal size of the cache?
The cache size depends on the application’s requirements. For example:
- Number of users using the application
- Size and/or type of the data in the application
- Nature of the application—for example, read heavy or write heavy
- Cache access patterns—for example, sequentially or randomly access the content
Now we know what a cache is and why it is needed. Data consistency between caches and servers is another detailed topic, but it's beyond the scope of this course. See the Spectrum of Consistency Models lesson in the Grokking Modern System Design Interview for Engineers & Managers course for a discussion on data consistency.
Next, we’ll explore exactly where we can put the cache in our application infrastructure.
Caching at different layers#
From the description above, it may seem like caching at the server end would be adequate. However, that is not always the case. We can use caching at different layers in end-to-end communication to reduce latency. Here, we’ll mainly talk about caching at three different layers: client, middleware, and server, as shown in the illustration below.
Client-side layer: This layer identifies the caching types on the client-side devices.
Web browser cache: The browser checks whether the required resources, such as the HTML, CSS, and other multimedia files needed to build a website, are available locally and, if so, returns them directly. Using local data is usually faster than the alternatives. For example, if a slow Internet connection makes the first request take several seconds, the browser can serve subsequent requests from local data within milliseconds, without going to the network.
Middleware layer: This layer identifies caches located on the network path between the client and the server.
Internet service provider (ISP): The ISP mainly maintains two cache types. The first is the Domain Name System (DNS) cache to reduce the DNS query latency. The second is the proxy server that sits in the middle of the client and origin server.
DNS: The main job of DNS is resolving a domain name to an IP address. The DNS resolver may take multiple round trips to different servers to obtain an IP address for the requested domain. DNS caching stores the results of these lookups and helps the resolver return the IP address with low latency.
CDNs: This is a large-sized cache that is used to serve numerous clients requesting the same data. CDNs mostly provide static objects to the closest clients.
Server-side layer: This layer reduces server-side burden by using in-memory caching systems.
API gateway cache: This cache stores responses to frequently requested API calls to avoid recomputing the same results. Similar subsequent requests are served from the cache instead of making downstream calls. It can store any data that can be transmitted over HTTP. The API gateway determines whether a request matches a previous one by analyzing its request headers and query parameters. It does not need to analyze the request payload because, for the most part, only GET requests are cached.
Web server cache: The web server cache stores the most frequently requested static web pages. In the case of dynamic data, the request is forwarded to and handled by the application server.
Application server cache: Normally, the data is stored in the database, and fetching it from disk takes much longer than fetching it from RAM. This layer stores the frequently accessed data objects in different formats, and multiple custom caching solutions can be used on the application server, such as DynaCache.
Database cache: The database cache is used to store the responses of queries that take time to execute and are frequently called, thereby reducing the query response time.
Note: Distributed cache solutions like memcached and Redis are quite popular server-side caching solutions.
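Server-side caches like Redis typically pair each entry with a time-to-live (TTL) so that stale data ages out automatically (Redis exposes this through its `EXPIRE` and `SETEX` commands). Below is a minimal in-process sketch of that idea; `TTLCache` is a hypothetical class for illustration, not a real Redis client:

```python
import time

class TTLCache:
    """In-memory cache where each entry expires after ttl_seconds,
    a simplified version of what Redis's EXPIRE provides."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:  # stale: evict and report a miss
            del self.store[key]
            return None
        return value

cache = TTLCache(ttl_seconds=0.05)
cache.set("query:top-trends", ["#python", "#caching"])
print(cache.get("query:top-trends"))  # ['#python', '#caching']
time.sleep(0.06)
print(cache.get("query:top-trends"))  # None (expired)
```

Expiring entries this way trades a few cache misses for the guarantee that stale data is never served beyond the TTL.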
Now we know where to place the cache, but the next question is how to identify content for caching and validate it from the server. For that purpose, we have the HTTP caching headers. Let's discuss them.
HTTP caching headers#
HTTP is the core of web APIs and provides cache support. When the server sends the HTTP response to the client, it also sends the cache headers in the response. These headers indicate whether the response can be cached on any caching layer. This section will explore different HTTP cache headers to know which headers are used for what purpose.
HTTP uses headers to set caching policies for the client and intermediate/shared caching devices. When a client sends the first request to any middleware for a resource, the middleware will forward the request to the origin server in case of unavailability to fetch that resource. In return, the server responds with the resource along with caching instructions in the caching header. The illustration below indicates the process in detail.
The illustration above shows that the origin or the web server responds with the resource along with caching instructions in the Cache-Control header. The public and max-age directives indicate that the resource can be cached both by shared caching devices and by clients for a specific time period.
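As a concrete illustration, a response carrying such instructions might look like the following (the content type, ETag value, and max-age are hypothetical):

```
HTTP/1.1 200 OK
Content-Type: text/css
Cache-Control: public, max-age=86400
ETag: "v2-5d41402a"
```

Here, `public, max-age=86400` tells every cache on the path that it may store this response and reuse it for up to 86,400 seconds (one day) before revalidating with the origin server.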
Primarily, the caching headers describe policies in the following disciplines:
Cacheability: This describes whether the content can be cached or not. For example, certain content may be cached for a specified amount of time, while others cannot be cached at all.
Scope: The scope describes the possibility of caching the content at a particular caching layer. It may be possible to store some content on the client side but not on the middleware. For example, personalized content prefers to be cached on the client side, and popular public Tweets prefer to be cached on the middleware cache, such as CDNs.
Expiration: As the name suggests, there is a possibility of storing data for a fixed time on a caching layer. In certain cases, this expiration time may be extended.
Validation: Since the expiration of cached content is a norm, it’s important that caching headers allow validation with the origin server and update the content with a new one before fulfilling the incoming requests.
Mainly, the Cache-Control header is used both in client requests and in server responses via some directives. We require some other headers as well to validate the content. Let's discuss headers that support the caching policies.
Caching through HTTP Headers

| Policy | Header | Values |
|---|---|---|
| Cacheability | `Cache-Control` | The `no-store` directive forbids caching the response on any layer, while `no-cache` allows caching but requires revalidation with the origin server before each reuse. |
| Scope | `Cache-Control` | The `public` directive allows the content to be cached on any layer (client and shared caches), while `private` restricts caching to the client only. |
| Expiration | `Cache-Control` | The `max-age` directive sets the number of seconds the content stays fresh; `s-maxage` sets the same limit for shared (middleware) caches only. |
| | `Expires` | Specifies an absolute date-time after which the cached content is considered stale. |
| Validation | `ETag` | It is an identifier for a unique version of a resource that is provided by the origin server. The server can send a newer version if the cache has an outdated copy of the content. |
| | `If-None-Match` | This header is coupled with `ETag`. The client sends the stored `ETag` value in it so the server can respond with `304 Not Modified` if the resource is unchanged. |
| | `Last-Modified` | The value of this header is in a date-time format that indicates when the content was last modified. The server sends the date and time of the last modification along with the resource in response to the request. This header is used with another caching header (such as `If-Modified-Since`) for validation. |
| | `If-Modified-Since` | This header has the same purpose as that of `If-None-Match`, but it carries the `Last-Modified` timestamp instead of the `ETag` value. |
Note: `ETag` and `Last-Modified` both are useful, but `ETag` may provide more precise information about the resource. For example, if the resource is modified frequently within a very short interval, then `Last-Modified` might remain the same across the frequent changes (due to the unavailability of a timestamp with fine-enough granularity). On the other hand, `ETag` always generates a unique ID for each change made to the resource's content.

- If the `max-age` and `public` directives are given together in the `Cache-Control` header, then the content is cached for the number of seconds specified in the `max-age` directive on all the caching layers from server to client. If, however, `public` is replaced by `private`, then this content is cached only by the client for the given time.
- If `max-age` and `s-maxage` are given together, then the value of `max-age` applies to the client, and the value of `s-maxage` overrides `max-age` for the shared caches on any middleware or server layer.
- If both `ETag` and `Last-Modified` are provided by the server, then `If-None-Match` and `If-Modified-Since` are sent from the client's end for validation. The expired content gets updated by the server.
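The `ETag`/`If-None-Match` handshake described above can be sketched as follows. Here `make_etag` and `handle_get` are hypothetical helpers that use a content hash as the `ETag`, which is one common choice but not the only one:

```python
import hashlib

def make_etag(content):
    # A content hash serves as the ETag: it changes whenever the body
    # changes, even for edits within the same Last-Modified second.
    return hashlib.sha256(content).hexdigest()[:16]

def handle_get(resource, if_none_match=None):
    # Returns (status, body), the way an origin server validates a cached copy.
    etag = make_etag(resource)
    if if_none_match == etag:
        return 304, b""        # cached copy is still fresh: no body sent
    return 200, resource       # full (possibly updated) response

doc_v1 = b"<html>v1</html>"
status1, _ = handle_get(doc_v1)               # first request: 200 with body
client_etag = make_etag(doc_v1)               # client stores the ETag
status2, _ = handle_get(doc_v1, client_etag)  # revalidation: 304 Not Modified
doc_v2 = b"<html>v2</html>"
status3, _ = handle_get(doc_v2, client_etag)  # content changed: 200 again
print(status1, status2, status3)  # 200 304 200
```

The 304 response carries no body, so a successful revalidation costs only the round trip, not the transfer of the resource itself.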
Discussion#
As traffic grows on the Internet, the load on the servers increases day by day. Caching helps us reduce loads on servers or networks and provides low-latency responses to the clients. However, caches have some drawbacks—they are expensive, limited in size, and may contain stale data. Furthermore, caches are not generally suitable for storing dynamic content. Considering the advantages of caching, we need to implement proper cache policies to keep content updated in the cache. Also, caching at all layers might be complex, especially when we want to achieve consistency with the origin server.
Point to Ponder
Question
Should caching be performed on every layer?
The answer depends on a number of things, including the content type. The server defines the policies through its caching headers. For instance, if the content is static, such as logos, CSS, or scripts of the sites, then the server sends a public directive in the Cache-Control header to indicate that the content can be cached on any layer. In contrast, if the content is sensitive, such as the user’s personal information, then the server sends a private directive so that only the client layer will cache the content.